Continuous Time Markov Decision Processes with Expected Discounted Total Rewards
Abstract
This paper discusses continuous time Markov decision processes with the criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued, and the discount rate is a real number. Under necessary conditions for the model to be well defined, the state space is partitioned into three subsets, on which the optimal value function is positive infinity, negative infinity, or finite, respectively. Correspondingly, the model is reduced to three submodels by generalizing policies and eliminating some of the worst actions. Then, for the submodel with finite optimal value, the validity of the optimality equation is shown and some of its properties are obtained.
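For orientation, a minimal sketch of the objects involved, in notation assumed here rather than taken from the paper: for an initial state $i$, a policy $\pi$, a discount rate $\alpha$, and a reward rate function $r$, the expected discounted total reward is typically

$$V(i,\pi)=\mathbb{E}_i^{\pi}\left[\int_0^{\infty}e^{-\alpha t}\,r(x_t,a_t)\,dt\right],$$

with optimal value $V^{*}(i)=\sup_{\pi}V(i,\pi)$. On the subset of states where $V^{*}$ is finite, the optimality equation referred to above is usually of the form

$$\alpha V^{*}(i)=\sup_{a\in A(i)}\Big\{r(i,a)+\sum_{j}q(j\mid i,a)\,V^{*}(j)\Big\},$$

where $q(j\mid i,a)$ are the transition rates; the exact conditions under which it is valid are the subject of the paper.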
Similar articles
Total Expected Discounted Reward MDPs: Existence of Optimal Policies
This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.
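As a point of comparison, a minimal sketch of the discrete-time version of this criterion (notation assumed): with discount factor $\beta\in[0,1)$ and one-step rewards $r(x_n,a_n)$,

$$v(x,\pi)=\mathbb{E}_x^{\pi}\left[\sum_{n=0}^{\infty}\beta^{\,n}\,r(x_n,a_n)\right],$$

and an optimal policy attains $v^{*}(x)=\sup_{\pi}v(x,\pi)$ for every initial state $x$; "discounted dynamic programming" refers to this optimization problem, commonly solved through the fixed point of the Bellman operator.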
Necessary Conditions for Continuous Time Markov Decision Processes with Expected Discounted Total Rewards
This paper discusses a set of necessary conditions for continuous time Markov decision processes with the criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued and the discount rate is any real number. Under necessary conditions for the model to be well defined, the state space is partitioned into three subsets, on which t...
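A hedged illustration of what well-definedness typically amounts to (assumed notation, not quoted from either paper): writing $r=r^{+}-r^{-}$ for the positive and negative parts of the reward rate, the expected discounted total reward at $(i,\pi)$ is well defined in the extended reals whenever at least one of

$$\mathbb{E}_i^{\pi}\left[\int_0^{\infty}e^{-\alpha t}r^{+}(x_t,a_t)\,dt\right] \quad\text{and}\quad \mathbb{E}_i^{\pi}\left[\int_0^{\infty}e^{-\alpha t}r^{-}(x_t,a_t)\,dt\right]$$

is finite, so that their difference never takes the indeterminate form $\infty-\infty$.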
Continuous Time Discounted Jump Markov Decision Processes: A Discrete-Event Approach
This paper introduces and develops a new approach to the theory of continuous time jump Markov decision processes (CTJMDP). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and eventually to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence of strategies that change actions between jumps and the randomized stra...
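A minimal sketch of the kind of reduction this refers to, in standard textbook form (the construction in the article itself may differ): if the sojourn time in state $x$ under action $a$ is exponential with rate $q(x,a)$, the reward rate is $r(x,a)$, and the discount rate is $\alpha>0$, then the expected discounted reward collected until the next jump and the expected discount accumulated over that sojourn are

$$\tilde r(x,a)=\frac{r(x,a)}{\alpha+q(x,a)},\qquad \beta(x,a)=\frac{q(x,a)}{\alpha+q(x,a)},$$

which recasts the discounted CTJMDP as a discrete-time MDP with one-step rewards $\tilde r$ and state-action-dependent discount factors $\beta(x,a)$.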
Sufficiency of Markov Policies for Continuous-Time Markov Decision Processes and Solutions of Forward Kolmogorov Equation for Jump Markov Processes
In continuous-time Markov decision processes (CTMDPs) with Borel state and action spaces and unbounded transition rates, we construct, for an arbitrary policy, a relaxed Markov policy such that the marginal distribution on the state-action pairs at any time instant is the same for both policies. This result implies the existence of a relaxed Markov policy that performs equally to an arbitrary po...
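For orientation, one common form of the forward Kolmogorov equation for a jump Markov process controlled by a Markov policy (assumed notation, not quoted from the article): if $\mu_t$ is the state marginal at time $t$ and $\tilde q(\,\cdot\mid x,a)$ the conservative transition-rate kernel (so $\tilde q(X\mid x,a)=0$), then

$$\frac{d}{dt}\,\mu_t(B)=\int_X \tilde q\big(B\mid x,\pi_t(x)\big)\,\mu_t(dx)\qquad\text{for measurable }B\subseteq X,$$

which is the type of equation linking a Markov policy to the marginal distributions discussed above; a relaxed policy averages the kernel over its action distribution.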
Chapter for MARKOV DECISION PROCESSES
Mixed criteria are linear combinations of standard criteria that cannot themselves be represented as standard criteria. Linear combinations of total discounted and average rewards, as well as linear combinations of total discounted rewards, are examples of mixed criteria. We discuss the structure of optimal policies and algorithms for their computation for problems with and without constraints.
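A hedged sketch of such a criterion (weights and discount factors are illustrative, not taken from the chapter): given discount factors $\beta_1,\dots,\beta_m$, weights $c_0,c_1,\dots,c_m$, discounted values $v_{\beta_k}(x,\pi)$, and average reward $w(x,\pi)$, a mixed criterion has the form

$$W(x,\pi)=c_0\,w(x,\pi)+\sum_{k=1}^{m}c_k\,v_{\beta_k}(x,\pi),$$

which in general cannot be rewritten as a single total discounted or average reward criterion.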